Responding to Human Spoken Audio Based on User Input

ABSTRACT

Systems and methods for responding to human spoken audio are provided herein. Exemplary methods may include receiving audio input for generating a speech signal using at least one microphone communicatively coupled to an intelligent assistant device. The method may also include transmitting the audio input from the intelligent assistant device to a natural language processor, the audio input having been converted from speech to a text query. The method may further include processing the text query using artificial intelligence (AI) logic, determining an Application Programming Interface (API) from a plurality of APIs for processing the text query, and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/734,282, filed on Jan. 4, 2013 and entitled “Systems and Methods for Responding to Human Spoken Audio,” which claims the benefit of U.S. Provisional Patent Application Ser. No. 61/584,752, filed on Jan. 9, 2012, and entitled “System and Methods for Responding to Human Spoken Audio Using a Natural Language Processor.” All of the above applications are hereby incorporated herein by reference in their entirety including all references cited therein.

FIELD OF THE INVENTION

The present invention relates generally to systems and methods of responding to human spoken audio, and more specifically, to systems and methods that interpret human spoken audio and then generate a response based on the interpretation of the human spoken audio.

SUMMARY OF THE PRESENT TECHNOLOGY

According to some embodiments, the present technology may be directed to methods that comprise: receiving audio input for generating a speech signal using at least one microphone communicatively coupled to an intelligent assistant device; transmitting the audio input from the intelligent assistant device to a natural language processor; converting the audio input from speech to a text query using the natural language processor; processing the text query using artificial intelligence (AI) logic using the natural language processor; determining an Application Programming Interface (API) from a plurality of APIs for processing the text query using the natural language processor; and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output using the natural language processor.

According to some embodiments, the present technology may be directed to a system that comprises: an intelligent assistant device comprising a processor which executes logic to perform operations comprising: receiving audio input for generating a speech signal using at least one microphone communicatively coupled to the intelligent assistant device; and a natural language processor communicatively coupled with the intelligent assistant device that executes logic to perform operations comprising: receiving the audio input from the intelligent assistant device; converting the audio input from speech to a text query; processing the text query using artificial intelligence (AI) logic; determining an Application Programming Interface (API) from a plurality of APIs for processing the text query; and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output.

BRIEF DESCRIPTION OF THE DRAWINGS

Certain embodiments of the present technology are illustrated by the accompanying figures. It will be understood that the figures are not necessarily to scale and that details not necessary for an understanding of the technology or that render other details difficult to perceive may be omitted. It will be understood that the technology is not necessarily limited to the particular embodiments illustrated herein.

FIG. 1 is a system for processing human spoken audio, in accordance with embodiments of the present invention;

FIG. 2 illustrates a flowchart of processing human spoken audio, in accordance with embodiments of the present invention;

FIG. 3 illustrates a display of interactions utilizing a device command interpreter, in accordance with embodiments of the present invention;

FIG. 4 illustrates a front perspective view of an intelligent assistant device, in accordance with embodiments of the present invention;

FIG. 5 illustrates a rear perspective view of an intelligent assistant device, in accordance with embodiments of the present invention;

FIG. 6 illustrates an overhead view of an intelligent assistant device, in accordance with embodiments of the present invention;

FIG. 7 illustrates side views of an intelligent assistant device, in accordance with embodiments of the present invention;

FIG. 8 illustrates another front perspective view of an intelligent assistant device, in accordance with embodiments of the present invention;

FIG. 9 provides a block diagram of components of an intelligent assistant device, in accordance with embodiments of the present invention;

FIG. 10 is a perspective view of an exemplary intelligent assistant device;

FIG. 11 is a perspective view of another exemplary intelligent assistant device;

FIG. 11A is a schematic diagram of an intelligent assistant device;

FIGS. 12A-G collectively illustrate a flow of data through an exemplary system architecture;

FIG. 13 illustrates an exemplary computing system that may be used to implement embodiments according to the present technology; and

FIGS. 14A and 14B collectively include various views of another exemplary intelligent assistant device.

DESCRIPTION OF EXEMPLARY EMBODIMENTS

While this technology is susceptible of embodiment in many different forms, there is shown in the drawings and will herein be described in detail several specific embodiments with the understanding that the present disclosure is to be considered as an exemplification of the principles of the technology and is not intended to limit the technology to the embodiments illustrated.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It will be understood that like or analogous elements and/or components, referred to herein, may be identified throughout the drawings with like reference characters. It will be further understood that several of the figures are merely schematic representations of the present technology. As such, some of the components may have been distorted from their actual scale for pictorial clarity.

The present technology provides hardware and software components that interact to interpret and respond to human spoken audio. In some embodiments, the hardware components include a microphone that receives audio comprising human spoken audio. The audio that comprises human spoken audio may in some instances be transmitted to a cloud computing cluster (e.g., a cloud-based computing environment) for processing. In general, a cloud-based computing environment is a resource that typically combines the computational power of a large grouping of processors and/or that combines the storage capacity of a large grouping of computer memories or storage devices. For example, systems that provide a cloud resource may be utilized exclusively by their owners; or such systems may be accessible to outside users who deploy applications within the computing infrastructure to obtain the benefit of large computational or storage resources.

The cloud may be formed, for example, by a network of web servers, with each web server (or at least a plurality thereof) providing processor and/or storage resources. These servers may manage workloads provided by multiple users (e.g., cloud resource customers or other users). Typically, each user places workload demands upon the cloud that vary in real-time, sometimes dramatically. The nature and extent of these variations typically depend on the type of business associated with the user.

With respect to the present technology, the audio commands that comprise human spoken audio may be processed to isolate the human spoken audio components from other audio aspects that may have also been recorded, such as background noise. In some instances, the present technology may utilize a digital signal processing beam-forming microphone assembly, which is included in an end user device. In other embodiments, various digital signal processes may be utilized at the cloud level to remove background noise or other audio artifacts. The processed human spoken audio may then be transmitted to a text processor. The text processor uses speech-to-text software (such as from Nuance®) and converts the human spoken audio into a string of text that represents the human spoken audio (hereinafter, “string of text”).
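
By way of non-limiting illustration, the following minimal sketch (in Python) shows one simple clean-up step of the general kind described above: an energy-based noise gate that suppresses frames dominated by background noise before the audio is handed to a speech-to-text service. The noise gate is a stand-in for the disclosed beam-forming and noise-reduction processes, and the frame size and threshold are illustrative assumptions, not parameters of the disclosed system.

    def noise_gate(samples, frame_size=256, threshold=0.01):
        """Zero out frames whose mean absolute amplitude falls below the
        threshold; surviving frames are presumed to carry speech."""
        cleaned = []
        for start in range(0, len(samples), frame_size):
            frame = samples[start:start + frame_size]
            energy = sum(abs(s) for s in frame) / len(frame)
            cleaned.extend(frame if energy >= threshold else [0.0] * len(frame))
        return cleaned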

Once the human spoken audio has been processed into a string of text, the text processor may then return the string of text to a processing server, which then transmits the string of text to a natural language processor. The term “natural language processor” may include, but is not limited to, any system, process, or combination of systems and methods that evaluate, process, parse, convert, translate, or otherwise analyze and/or transform natural language commands. For example, an exemplary natural language processor may convert natural language content from an audio format into a text format (e.g., speech to text), and may also evaluate the content for sentiment, mood, context (e.g., domain), and so forth. Again, these natural language commands may include audio-format and/or text-format natural language content, as well as other content formats that would be known to one of ordinary skill in the art.

At the natural language processor, the string of text may be broken down into formal representations, such as first-order logic structures, that contain contextual clues or keyword targets. These contextual clues and/or keyword targets are then used by a computer system to understand and manipulate the string of text. The natural language processor may identify the most likely semantics of the string of text based on an assessment of the multiple possible semantics which could be derived from the string of text.
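
A minimal sketch of this kind of analysis, under assumed keyword lists, follows; it reduces a string of text to keyword targets and selects the most likely domain among several candidate semantics. The domains and keywords are invented for illustration.

    DOMAIN_KEYWORDS = {
        "weather": {"weather", "temperature", "forecast", "rain"},
        "alarm": {"alarm", "wake", "snooze"},
        "news": {"news", "headlines", "story"},
    }

    def most_likely_domain(text):
        """Score each candidate domain by keyword-target overlap and
        return the best match, or None when no keyword target is found."""
        tokens = set(text.lower().split())
        scores = {d: len(tokens & kws) for d, kws in DOMAIN_KEYWORDS.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None

    # most_likely_domain("What's the weather like in Los Angeles") -> "weather"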

It will be understood that in some embodiments, one or more of the features described herein, such as noise reduction, natural language processing, speech-to-text services (and vice versa), text parsing, and other features described herein, may be executed at the device level (e.g., on the intelligent assistant device). In other instances, many or all of the aforementioned features may be executed at the cloud level, such that the intelligent assistant device receives audio commands and returns responses. Thus, most or all of the processing of the audio commands may occur at the cloud level. In some instances, the intelligent assistant device and the cloud may share processing duties such that the intelligent assistant device executes some of the features of the present technology and the cloud executes other processes. This cooperative content processing relationship between the intelligent assistant device and the cloud may function to load balance the duties required to process audio commands and return responses to the end user.

Based on the computer system's understanding of the semantics of the string of text, data from the string of text will be prepared for delivery to an appropriate application program interface (API). For example, a string of text comprising the query, “What's the weather look like today in Los Angeles, Calif.?” may be processed by the computer system and distributed to a weather API. Further, the weather API may process the data from the string of text to access a weather forecast for Los Angeles, Calif. associated with the day that the query was asked.

Since APIs may have different data structure requirements for processing queries, one aspect of the natural language processor may be formatting the data from the string of text to correspond to the data structure format of an API that has been determined to be appropriate.
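
The following sketch illustrates one way such per-API formatting might look, assuming hypothetical request shapes for a weather API and a news API; the field names are illustrative and do not correspond to any actual API.

    def format_for_api(api_name, parsed):
        """Reshape a parsed query to the data structure format of the
        API that was determined to be appropriate."""
        if api_name == "weather":
            return {"city": parsed["location"], "date": parsed.get("date", "today")}
        if api_name == "news":
            return {"topic": parsed["topic"], "max_results": 5}
        raise ValueError("no formatter for API: " + api_name)

    # format_for_api("weather", {"location": "Los Angeles, CA"})
    # -> {"city": "Los Angeles, CA", "date": "today"}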

Once an API has processed the query data derived from human spoken audio, an API response may be generated. This API response may then be converted into an appropriate pre-spoken text string, also referred to as a fulfillment. Further, the API response may be recorded in a database and then converted to a speech response. Once the API response has been converted to a speech response, the speech response may be distributed to a hardware component to play back the API speech response. The API response may also be saved and/or paired with the query data and saved in a cache for efficient lookup. That is, rather than processing the same query data multiple times, the present technology may obtain previously generated responses to identical or substantially identical query data from the cache. Temporal limitations may be placed upon the responses stored in the cache. For example, responses for queries regarding weather may be obtained from the cache only for relevant periods of time, such as an hour.
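
A minimal sketch of such a cache with temporal limitations follows; the one-hour lifetime for weather responses comes from the example above, while the default lifetime is an illustrative assumption.

    import time

    TTL_BY_DOMAIN = {"weather": 3600}  # weather answers stay relevant ~1 hour
    DEFAULT_TTL = 300                  # illustrative default lifetime (seconds)
    _cache = {}

    def cached_response(query, domain, fetch):
        """Return a stored answer for an identical query while it is still
        fresh; otherwise call fetch() and record the answer with its time."""
        now = time.time()
        hit = _cache.get(query)
        if hit is not None and now - hit[1] < TTL_BY_DOMAIN.get(domain, DEFAULT_TTL):
            return hit[0]
        answer = fetch()
        _cache[query] = (answer, now)
        return answer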

In some embodiments, a hardware unit (e.g., intelligent assistant device) may act as a base station that is connected over a Wi-Fi network to both the natural language processor as well as other enabled devices. For instance, other enabled devices may act as a microphone, transmitting human spoken audio via the base station to the natural language processor, or transmitting human spoken audio directly to the intelligent assistant device and then to the natural language processor. Additionally, enabled devices may also receive commands from a general server based on interpretation of the human spoken data using the natural language processor. For instance, a Wi-Fi enabled thermostat may receive a command from the base station when the natural language processor has interpreted human spoken audio to request an increase in temperature within a room controlled by the user device that received the human spoken audio data. In some instances, the intelligent assistant device may utilize other communication media including, but not limited to, Bluetooth, near field communications, RFID, infrared, or other communicative media that would be known to one of ordinary skill in the art with the present disclosure before them.
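
The following sketch suggests how a base station might dispatch an interpreted command to an enabled device such as a thermostat. The handler registry, the intent shape, and the thermostat interface are assumptions for illustration only, not part of the disclosed system.

    class Thermostat:
        """Stand-in for a Wi-Fi enabled thermostat."""
        def __init__(self):
            self.setpoint = 70

        def adjust(self, delta):
            self.setpoint += delta
            return "temperature set to %d degrees" % self.setpoint

    DEVICES = {"thermostat": Thermostat()}

    def dispatch(intent):
        """Route an interpreted intent, e.g.
        {"device": "thermostat", "action": "adjust", "delta": 2}."""
        device = DEVICES[intent["device"]]
        return getattr(device, intent["action"])(intent["delta"])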

In accordance with the discussion above, FIG. 1 is a system 100 for processing human spoken audio, in accordance with embodiments of the present invention. In particular, a user spoken question or command 105 is received at microphone 110. The user spoken question or command is within audio data. As such, the audio data comprises human spoken audio. The audio data may then be transmitted to hardware device 115 (such as an intelligent assistant device) or connected hardware device 120, such as a cellular phone or digital music player. If the audio data is transmitted to connected hardware device 120, the audio data may then be further transmitted to hardware device 115 after being received at connected hardware device 120. Alternatively, hardware device 115 may comprise microphone 110 such that hardware device 115 is able to record user commands and distribute recorded user commands to a speech and text system for initial processing.

From hardware device 115, audio 125 and user identification information 130 are transmitted to a server 135, also referred to as a command processing server. The audio may be cleaned at the server 135. In particular, a combination of a microphone array (e.g., using audio captured by multiple microphones), beam-forming, noise-reduction, and/or echo cancellation systems may be used to clean up the received audio to remove audio characteristics that are not human spoken audio. Once audio is received at the server 135, the audio is provided to a speech-to-text service 140.

At the speech-to-text service 140, the audio is converted to a string of text that represents human spoken audio. In embodiments, the string of text may be stored in a database 142. In particular, database 142 may be used for storage of user data and for the storage of processed queries. Further, database 142 may be used to manage learned behaviors based on unique hardware identification.

The string of text may then be transmitted to a natural language processor 145. Natural language processor 145 may parse unstructured text documents and extract key concepts such as context, category, meaning, and keywords. Additionally, the natural language processor 145 may comprise artificial intelligence (AI) logic to process the string of text into a discernible query. Further, natural language processor 145 may utilize machine learning and/or a neural network to analyze the string of text. For example, natural language processor 145 may utilize a method of interpreting and learning from patterns and behaviors of the users and attributing data to such behaviors.

Further, natural language processor 145 may be run on a server system that also runs a neural network. Additionally, the natural language processor 145 may determine which query API 150 is most appropriate to receive the query associated with the string of text. Further, once a query API 150 is determined, the natural language processor 145 may modify the query to comply with the structure of queries appropriate to the determined query API 150.

The query generated at the natural language processor 145 is then provided to a query API 150. An exemplary API may comprise a variety of open source or licensed APIs that are used to take natural language processor output and retrieve the necessary data. The query API 150 processes the query and provides a query response to server 135. Once the query response is received at server 135, the query response may be transmitted to a format response component 155. The format response component 155 may comprise, for example, a text-to-speech translator. In particular, a text-to-speech translator may comprise a system used to take the natural language processor output in text format and output it in spoken audio. The answer 160 may then be provided to hardware device 115, such as via a device interface comprising a process of returning the spoken audio from the text-to-speech component to hardware device 115. From hardware device 115, the answer may be transmitted through a speaker 165 as a system spoken audio response 170.

FIG. 2 illustrates a flowchart for processing human spoken audio, in accordance with embodiments of the present invention. At 202, a customer speaks a first trigger command, “Hello, ivee.” It will be understood that the trigger command may be end user defined. At 204, a microphone captures the first trigger command, which may then be transmitted to device 210. The first trigger command may also be referred to as an initiating command. An initiating command may prompt the device to ready itself for a subsequent audio command.
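
A minimal sketch of this two-stage trigger handling follows, assuming transcripts have already been produced by a speech-to-text service; the Device class and its re-arming behavior are illustrative assumptions, not the disclosed implementation.

    TRIGGER_PHRASE = "hello, ivee"  # may be end user defined

    class Device:
        def __init__(self):
            self.armed = False  # True after the initiating command

        def on_transcript(self, transcript):
            text = transcript.strip().lower()
            if not self.armed:
                if text == TRIGGER_PHRASE:
                    self.armed = True   # ready for a subsequent audio command
                    return "Command please"
                return None             # ignore speech until triggered
            self.armed = False          # consume one audio command, then re-arm
            return "processing: " + text  # hand off to the query pipeline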

In another embodiment, a customer speaks a second trigger command (also referred to as an audio command) such as, “Weather Los Angeles,” at 206. In some instances, the audio command may comprise a natural language or spoken word command, such as the audio command at 206. The second trigger command is captured at 208 by one or more microphones and transmitted to device 210. Device 210 is coupled with a private API 230 via WiFi connection 222. Further, device 210 provides an audio query 224 to private API 230. In particular, audio query 224 is derived from an audio command.

Audio query 224 may then be provided to a speech/text processor 232, which translates the audio query 224 into text query “Weather Los Angeles” 226. Text query “Weather Los Angeles” 226 is then provided to private API 230, which directs text query “Weather Los Angeles” 226 to an AI Logic component 234. AI Logic component 234 then provides text query “Weather Los Angeles” 226 to a third party API 236. For example, for the text query “Weather Los Angeles” 226, an appropriate third party API 236 may be a weather API. Third party API 236 then generates text answer “76 Degrees” 228.

Text answer “76 Degrees” 228 may then be provided to AI Logic component 234. Further, text answer “76 Degrees” 228 may then be transmitted from AI Logic component 234 to private API 230. Further, text answer “76 Degrees” 228 is provided to a speech/text processor 232, where text answer “76 Degrees” 228 is translated to audio answer 238. Audio answer 238 may then be provided to private API 230 and then provided to device 210. From device 210, audio answer 238 is output as an audio “76 Degrees” 240. In particular, audio response “76 Degrees” 240 is in response to audio command “Weather Los Angeles” 204. Further, audio “Command Please” 242 is in response to the initiating command “Hello, ivee” 202.
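
The round trip of FIG. 2 might be sketched as follows, with every stage stubbed out; only the ordering of the stages (speech-to-text, AI logic, third party API, text-to-speech) comes from the figure, and the stub implementations are assumptions for illustration.

    def speech_to_text(audio):
        return audio["transcript"]     # stand-in for a real recognizer

    def weather_api(request):
        return "76 Degrees"            # canned third party answer

    def ai_logic(text_query):
        # pick an appropriate third party API and format the query for it
        return weather_api, {"city": text_query.replace("Weather ", "")}

    def text_to_speech(text):
        return {"speech": text}        # stand-in for a real synthesizer

    def answer_audio_command(audio):
        text_query = speech_to_text(audio)       # e.g. "Weather Los Angeles"
        api, request = ai_logic(text_query)
        return text_to_speech(api(request))      # e.g. speech for "76 Degrees"

    # answer_audio_command({"transcript": "Weather Los Angeles"})
    # -> {"speech": "76 Degrees"}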

FIG. 3 illustrates a display 300 of interactions utilizing a device command interpreter 305, in accordance with embodiments of the present invention. For example, device command interpreter 305 interacts with an interface with a device 310. In particular, device command interpreter 305 receives a command from the interface with the device 310. Device command interpreter 305 also provides commands to the interface with the device 310. Further, device command interpreter 305 interacts with a text/speech processor 325. In particular, device command interpreter 305 may provide a request for text-to-speech translation by providing a string of text to a text-to-speech (“TTS”) component 315 of the text/speech processor 325. Additionally, device command interpreter 305 may provide a request for speech-to-text translation by providing a voice file to a speech-to-text component 320 of the text/speech processor 325. Further, device command interpreter 305 may receive scenario information from scenario building component 330.

Additionally, device command interpreter 305 is also communicatively coupled with language interpreter 335. In particular, device command interpreter 305 may provide a sentence with scenario information to language interpreter 335. Further, the language interpreter 335 may generate analyzed sentence information and send the analyzed sentence information with scenario information to a decision making engine 340. The decision making engine 340 may select a most appropriate action. Further, the decision making engine 340 may utilize user accent references from a voice database 345. Based on the analyzed sentence information and the scenario information, the decision making engine 340 may generate a selected most appropriate action from scenarios and, further, may provide the selected most appropriate action to the device command interpreter 305.
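
A minimal sketch of such a decision making engine follows; it scores candidate scenarios by keyword overlap with tokens from the analyzed sentence information and selects the most appropriate action. The scenario contents and token representation are invented for illustration.

    SCENARIOS = [
        {"action": "set_alarm", "keywords": {"alarm", "wake", "morning"}},
        {"action": "report_weather", "keywords": {"weather", "forecast"}},
    ]

    def select_action(analyzed_tokens):
        """Pick the most appropriate action for a set of tokens taken
        from the analyzed sentence information."""
        def score(scenario):
            return len(scenario["keywords"] & analyzed_tokens)
        best = max(SCENARIOS, key=score)
        return best["action"] if score(best) > 0 else None

    # select_action({"weather", "tomorrow"}) -> "report_weather"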

The device command interpreter 305 may also send a request to build sentence information to a sentence generator component 350. In response, the sentence generator component 350 may provide a built sentence string to device command interpreter 305. Additionally, the device command interpreter 305 may request service on servers by providing a service request to an add-on service interface 355. The add-on service interface 355 may provide the service request to a voice database web server 360. Further, a response generated by voice database web server 360 may be provided to the device command interpreter 305 via the add-on service interface 355.

Further, device command interpreter 305 may interact with a user database 370 via a user information database 365. In particular, device command interpreter 305 may provide user information and device authentication to the user database 370 via an interface of the user information database 365. Additionally, device command interpreter 305 may interact with a streaming interface 380 of a device via communications module 375. In particular, device command interpreter 305 may provide a file for download and/or text-to-speech voice data to a file downloader. Communications module 375 may include any data communications module that is capable of providing control of data streaming processes to the streaming interface 380 of the device. In response, the streaming interface 380 of the device may provide a data stream to communications module 375. The communications module 375 may, in turn, provide a voice stream up to device command interpreter 305.

FIG. 4 illustrates a front perspective view 400 of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, the intelligent assistant device comprises a screen 405, a frame 410, and a device stand 415. Further, FIG. 5 illustrates a rear perspective view 500 of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, the intelligent assistant device comprises a speaker 505, input slots 510, device stand 515, and button 520.

FIG. 6 illustrates an overhead view 600 of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, the intelligent assistant device comprises an audio button 605, a snooze button 610, a mode button 615, and an intelligent assistant device stand 620. FIG. 7 illustrates side views 705a and 705b of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, the intelligent assistant device comprises buttons 710 and intelligent assistant device stand 715.

FIG. 8 illustrates another front perspective view 800 of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, the intelligent assistant device comprises a screen 805, a frame 810, and a device stand 815. Further, screen 805 comprises a city indicator, a weather indicator, a date indicator, a time indicator, an alarm indicator, a message indicator, and a battery indicator. It will be understood that the screens are merely exemplary and other indicators may likewise be utilized in accordance with the present technology. In some instances, the indicators utilized in screen 805 may relate to the types or domains of natural language queries that may be processed by the device.

FIG. 9 provides a block diagram 900 of components of an intelligent assistant device, in accordance with embodiments of the present invention. In particular, FIG. 9 comprises microphones 902, which provide audio data to an audio processor module 904. Audio processor module 904 provides analog data to a sensory natural language processor 906. Further, audio processor module 904 provides an analog or SPI (Serial Peripheral Interface) signal to a processor 908, where the processor comprises a main Atmel chip. Further, light sensor 910 and temperature sensor 912 also provide data to processor 908. Buttons and/or switches 914 also provide data to processor 908 via a touch sensor controller 916. Additionally, data is communicated between processor 908 and sensory natural language processor 906. Sensory natural language processor 906 is also coupled with an external memory for firmware 918.

Processor 908 also exchanges information with a Fast Super-Twisted Nematic (FSTN) Liquid Crystal Display (LCD) module with driver 920, as well as a WiFi module 922. Further, processor 908 is communicatively coupled with an Electrically Erasable Programmable Read-Only Memory (EEPROM) 924 for user information and/or settings. Processor 908 is also communicatively coupled with radio module 926 and audio mux 928. Audio mux 928 is an audio amplifier chip. Audio mux 928 also receives data from aux audio input 930. Further, sensory natural language processor 906 also provides data to audio mux 928. Additionally, audio mux 928 provides data to audio amp 932 and stereo speaker 934. FIG. 9 also comprises, for example, a USB jack (or other similar communicative interface) for recharging 936 that charges rechargeable battery 938.

In addition to the embodiments described above, another exemplary embodiment may utilize a plurality of microphones of a smartphone base to implement a natural language processor, in accordance with embodiments of the present invention. In particular, audio is received at the plurality of microphones at a smartphone base. The audio is received at an application running a natural language processor, such as natural language processor 145. Further, the application comprises a clean-up component that utilizes a combination of a microphone array, beam-forming, noise-reduction, and/or echo cancellation systems to clean up the received audio. In particular, the clean-up component may remove non-human spoken audio and/or may remove garbled human spoken audio (e.g., background conversations) that does not comprise primary human spoken audio (e.g., the human spoken audio of the primary user). By using this process, a user can interact with a smartphone application from approximately ten feet away or closer. As such, by using audio clean-up processes such as beamforming, audio received from microphones of auxiliary hardware devices, such as a dock for smartphone devices, may be used to interact with an application that comprises a natural language processor, such as natural language processor 145.
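
One of the clean-up techniques named above, beamforming, might be sketched in its simplest delay-and-sum form as follows: each microphone channel is shifted back by its known sample delay toward the speaker and the channels are averaged, so coherent speech reinforces itself while diffuse background noise partially cancels. The assumption that per-channel integer delays are already known (e.g., from the array geometry) is an illustrative simplification.

    def delay_and_sum(channels, delays):
        """channels: equal-length lists of samples, one per microphone;
        delays: integer sample delay of each channel relative to the
        earliest arrival. Shifting each channel by its delay aligns the
        speech before the channels are averaged."""
        n = len(channels[0])
        out = [0.0] * n
        for samples, d in zip(channels, delays):
            for i in range(n):
                if 0 <= i + d < n:
                    out[i] += samples[i + d]
        return [v / len(channels) for v in out]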

FIG. 10 is a perspective view of an exemplary intelligent assistant device, which includes a base station in combination with a clock. The intelligent assistant device may include any of the natural language processing and third party information features described above in addition to features commonly utilized with alarm clocks. Thus, the alarm clock may be controlled by the features and operations of the personal digital assistant device associated therewith. FIG. 11 is a perspective view of another exemplary intelligent assistant device, which includes a sleek and uni-body construction.

FIG. 11A is a schematic diagram of various components of an intelligent assistant device, for use with any of the intelligent assistant device products described herein.

FIGS. 12A-G collectively illustrate an exemplary flow diagram of data through an exemplary system that includes an intelligent assistant device 1200. In FIG. 12A, an intelligent assistant device 1200 may be communicatively coupled with various systems through a client API 1205. More specifically, the intelligent assistant device 1200 may communicatively couple with a speech processor 1210 of FIG. 12B, which, in turn, couples with an external speech recognition engine 1215 in some instances. Again, the intelligent assistant device 1200 may include an integral speech recognition application.

A frames scheduler may be utilized to schedule and correlate responseswith other objects such as advertisements.

The intelligent assistant device 1200 may also communicatively couple with a notifications server 1220, as shown in FIG. 12C. The notifications server 1220 may cooperate with the frames scheduler and an advertisements engine to query relevant advertisements and integrate the same into a response, which is returned to the intelligent assistant device 1200.

As shown in FIG. 12D, the system may utilize a command fulfiller 1225 that creates API requests and processes responses to those requests. Additionally, the command fulfiller 1225 may also generate return response objects. The command fulfiller 1225 may communicatively couple with the speech processor 1210 of FIG. 12B, as well as various sub-classes of command fulfillers 1230. These sub-classes of command fulfillers 1230 may query third party information sources, such as an external knowledge engine 1235. The sub-classes of command fulfillers 1230 may be domain specific, such as news, weather, and so forth.
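
A minimal sketch of such a command fulfiller hierarchy follows; a base class builds an API request and wraps the response into a return response object, while a domain-specific sub-class supplies the request shape. The request and response shapes, and the canned answer standing in for an external knowledge engine, are assumptions for illustration.

    class CommandFulfiller:
        """Creates an API request and processes the response into a
        return response object."""
        def fulfill(self, query):
            request = self.build_request(query)
            return {"query": query, "answer": self.call_api(request)}

    class WeatherFulfiller(CommandFulfiller):
        """Domain-specific sub-class for weather queries."""
        def build_request(self, query):
            return {"city": query.get("location")}

        def call_api(self, request):
            return "76 Degrees"  # stand-in for an external knowledge engine

    FULFILLERS = {"weather": WeatherFulfiller()}

    def fulfill(domain, query):
        return FULFILLERS[domain].fulfill(query)

    # fulfill("weather", {"location": "Los Angeles"})
    # -> {"query": {"location": "Los Angeles"}, "answer": "76 Degrees"}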

FIG. 12E illustrates the use of a frame generator 1240 that processes information obtained by the sub-classes of command fulfillers 1230 of FIG. 12D. Additionally, a plug-in framework for third party applications module 1245 is shown. This module 1245 allows for communicative coupling and interfacing of third party applications 1265 of FIG. 12G, via a developer API 1270.

Additionally, a user management system 1250 may allow for setup of the intelligent assistant device 1200 by an end user. The end user may utilize a web-based portal 1255 of FIG. 12F that allows the end user to set up and manage their device via a device management API 1260.

FIG. 13 illustrates an exemplary computing system 1300 that may be used to implement an embodiment of the present systems and methods. The system 1300 of FIG. 13 may be implemented in the contexts of the likes of computing systems, networks, servers, or combinations thereof. The computing system 1300 of FIG. 13 includes one or more processors 1310 and main memory 1320. Main memory 1320 stores, in part, instructions and data for execution by processor 1310. Main memory 1320 may store the executable code when in operation. The system 1300 of FIG. 13 further includes a mass storage device 1330, portable storage device 1340, output devices 1350, user input devices 1360, a display system 1370, and peripheral devices 1380.

The components shown in FIG. 13 are depicted as being connected via a single bus 1390. The components may be connected through one or more data transport means. Processor unit 1310 and main memory 1320 may be connected via a local microprocessor bus, and the mass storage device 1330, peripheral device(s) 1380, portable storage device 1340, and display system 1370 may be connected via one or more input/output (I/O) buses.

Mass storage device 1330, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 1310. Mass storage device 1330 may store the system software for implementing embodiments of the present invention for purposes of loading that software into main memory 1320.

Portable storage device 1340 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk, digital video disc, or USB storage device, to input and output data and code to and from the computer system 1300 of FIG. 13. The system software for implementing embodiments of the present invention may be stored on such a portable medium and input to the computer system 1300 via the portable storage device 1340.

User input devices 1360 provide a portion of a user interface. User input devices 1360 may include an alphanumeric keypad, such as a keyboard, for inputting alphanumeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. Additional user input devices 1360 may comprise, but are not limited to, devices such as speech recognition systems, facial recognition systems, motion-based input systems, gesture-based systems, and so forth. For example, user input devices 1360 may include a touchscreen. Additionally, the system 1300 as shown in FIG. 13 includes output devices 1350. Suitable output devices include speakers, printers, network interfaces, and monitors.

Display system 1370 may include a liquid crystal display (LCD) or other suitable display device. Display system 1370 receives textual and graphical information, and processes the information for output to the display device.

Peripheral device(s) 1380 may include any type of computer support device to add additional functionality to the computer system. Peripheral device(s) 1380 may include a modem or a router.

The components provided in the computer system 1300 of FIG. 13 are those typically found in computer systems that may be suitable for use with embodiments of the present invention and are intended to represent a broad category of such computer components that are well known in the art. Thus, the computer system 1300 of FIG. 13 may be a personal computer, hand held computing system, telephone, mobile computing system, workstation, server, minicomputer, mainframe computer, or any other computing system. The computer may also include different bus configurations, networked platforms, multi-processor platforms, etc. Various operating systems may be used, including Unix, Linux, Windows, Mac OS, Palm OS, Android, iOS (known as iPhone OS before June 2010), QNX, and other suitable operating systems.

FIGS. 14A and 14B collectively provide views of an exemplary embodiment of an intelligent assistant device that functions as a base for receiving a second hardware device, such as a cellular telephone. It will be understood that the intelligent assistant device may include any communicative interface that allows for one or more devices to interface with the intelligent assistant device via a physical connection.

It is noteworthy that any hardware platform suitable for performing the processing described herein is suitable for use with the systems and methods provided herein. Computer-readable storage media refer to any medium or media that participate in providing instructions to a central processing unit (CPU), a processor, a microcontroller, or the like. Such media may take forms including, but not limited to, non-volatile and volatile media such as optical or magnetic disks and dynamic memory, respectively. Common forms of computer-readable storage media include a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic storage medium, a CD-ROM disk, digital video disk (DVD), any other optical storage medium, RAM, PROM, EPROM, a FLASH EPROM, and any other memory chip or cartridge.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be coupled with the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. Exemplary embodiments were chosen and described in order to best explain the principles of the present technology and its practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Aspects of the present invention are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus, or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. The descriptions are not intended to limit the scope of the technology to the particular forms set forth herein. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments. It should be understood that the above description is illustrative and not restrictive. To the contrary, the present descriptions are intended to cover such alternatives, modifications, and equivalents as may be included within the spirit and scope of the technology as defined by the appended claims and otherwise appreciated by one of ordinary skill in the art. The scope of the technology should, therefore, be determined not with reference to the above description, but instead should be determined with reference to the appended claims along with their full scope of equivalents.

What is claimed is:
1. A system comprising: an intelligent assistant device comprising a processor which executes logic to perform operations comprising: receiving audio input for generating a speech signal using at least one microphone communicatively coupled to the intelligent assistant device; and a natural language processor communicatively coupled with the intelligent assistant device that executes logic to perform operations comprising: receiving the audio input from the intelligent assistant device; converting the audio input from speech to a text query; processing the text query using artificial intelligence (AI) logic; determining an Application Programming Interface (API) from a plurality of APIs for processing the text query; and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output.
2. The system of claim 1, wherein the natural language processor further uses machine learning to analyze a string of text.
3. The system of claim 1, wherein the natural language processor uses a neural network to analyze the string of text.
4. The system of claim 1, wherein the natural language processor interprets and learns from patterns and behaviors of a user, attributing data to the patterns and behaviors such that a response to a future command from the user can be automatically generated by the intelligent assistant device.
5. The system of claim 1, wherein the intelligent assistant device acts as a base station connected to at least one enabled device such that the audio input received by the intelligent assistant device is used to adjust an operation of the connected at least one enabled device.
6. The system of claim 5, wherein the at least one enabled device receives data to transmit to or from the intelligent assistant device.
7. The system of claim 5, wherein the at least one enabled device is connected to the intelligent assistant device via Bluetooth.
8. The system of claim 5, wherein the at least one enabled device comprises a smartphone comprising: at least one microphone for receiving audio commands; at least one user input interface; a mobile application for processing audio and user input commands; and a natural language processor for performing automatic speech recognition of the audio commands.
9. The system of claim 5, wherein the at least one enabled device comprises at least one smart home device.
10. The system of claim 5, wherein the at least one smart home device receives commands from a general server connected to the intelligent assistant device.
11. The system of claim 1, wherein the intelligent assistant device utilizes digital signal processing to separate background noise in the audio input.
12. The system of claim 1, wherein the intelligent assistant device includes indicators that provide interactive feedback.
13. A method, comprising: receiving audio input for generating a speech signal using at least one microphone communicatively coupled to an intelligent assistant device; transmitting the audio input from the intelligent assistant device to a natural language processor; converting the audio input from speech to a text query using the natural language processor; processing the text query using artificial intelligence (AI) logic using the natural language processor; determining an Application Programming Interface (API) from a plurality of APIs for processing the text query using the natural language processor; and transmitting a response from the API to the intelligent assistant device or another device communicatively coupled to the intelligent assistant device for output using the natural language processor.
14. The method of claim 13, further comprising processing the text query using machine learning to analyze a string of text using the natural language processor.
15. The method of claim 13, further comprising processing the text query using a neural network to analyze the string of text using the natural language processor.
16. The method of claim 13, further comprising connecting the intelligent assistant device to at least one enabled device such that the audio input received by the intelligent assistant device is used to adjust an operation of the connected at least one enabled device.
17. The method of claim 16, wherein the at least one enabled device receives data to transmit to or from the intelligent assistant device.
18. The method of claim 16, wherein the at least one enabled device comprises a smartphone comprising: at least one microphone for receiving audio commands; at least one user input interface; a mobile application for processing audio and user input commands; and a natural language processor for performing automatic speech recognition of the audio commands.
19. The method of claim 16, wherein the at least one enabled device comprises at least one smart home device that receives commands from a general server connected to the intelligent assistant device.
20. An interactive device, comprising: at least one microphone; at least one speaker; and a processor that executes logic stored in memory to perform operations comprising: receiving audio input for generating a speech signal using the at least one microphone; transmitting the audio input from the device to a natural language processor; receiving a response to the audio input from a server; and outputting the response.